Predictive value of radiomic features extracted from primary lung adenocarcinoma in forecasting thoracic lymph node metastasis: a systematic review and meta-analysis

Background The application of radiomics to thoracic lymph node metastasis (LNM) of lung adenocarcinoma is increasing, but the diagnostic performance of primary-tumor radiomics for predicting LNM has not been systematically reviewed. This study therefore sought to provide an overview of the methodological quality and diagnostic performance of radiomic approaches for predicting the likelihood of LNM in lung adenocarcinoma.
Methods Studies were gathered from PubMed, Embase, the Web of Science Core Collection, and the Cochrane Library. The Radiomic Quality Score (RQS) and the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) were both used to assess the quality of each study. The pooled sensitivity, specificity, and area under the curve (AUC) of the best radiomics models in the training and validation cohorts were calculated. Subgroup and meta-regression analyses were also conducted.
Results Seventeen studies from 2018 to 2022, each with 159 to 1202 patients, were enrolled, of which ten had sufficient data for the quantitative evaluation. The percentage RQS ranged from 11.1% to 44.4%, and most studies were considered to have a low risk of bias and few applicability concerns on QUADAS-2. Pyradiomics and logistic regression analysis were the most commonly used software and method for radiomics feature extraction and selection, respectively. In addition, the best prediction models in the seventeen studies were mainly based on radiomics features combined with non-radiomics features (semantic features and/or clinical features). The pooled sensitivity, specificity, and AUC of the training cohorts were 0.84 (95% confidence interval (CI) [0.73–0.91]), 0.88 (95% CI [0.81–0.93]), and 0.93 (95% CI [0.90–0.95]), respectively. For the validation cohorts, the pooled sensitivity, specificity, and AUC were 0.89 (95% CI [0.82–0.94]), 0.86 (95% CI [0.74–0.93]), and 0.94 (95% CI [0.91–0.96]), respectively.
Conclusions Radiomic features based on the primary tumor have the potential to predict preoperative LNM of lung adenocarcinoma. However, the radiomics workflow needs to be standardized to better promote the applicability of radiomics.
Trial registration CRD42022375712.
Supplementary Information The online version contains supplementary material available at 10.1186/s12890-024-03020-x.


Introduction
Lung cancer is currently the second most common cancer in incidence and the leading cause of cancer-related mortality in the world [1]. Adenocarcinoma is the most common histological subtype [2], and lymph node metastasis (LNM) is the main mode of cancer metastasis. Accurate preoperative prediction of LNM is of great significance for the treatment and prognosis prediction of adenocarcinoma [3]. Currently, diagnostic methods are classified as either invasive or noninvasive. Invasive procedures such as mediastinoscopic biopsy, ultrasound-guided bronchial needle aspiration, or lymph node sampling carry risks of postoperative complications for the patient [4,5]. Noninvasive measures, on the other hand, are commonly the next best test of choice. Radiological studies such as computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography/computed tomography (PET/CT) have all demonstrated potential diagnostic efficacy in identifying LNM [6,7]. Yet false negative and false positive judgments may occur on CT and PET/CT due to clinical and radiological factors such as micrometastasis or inflammatory hyperplasia [8,9]. While MRI is radiation-free and can offer apparent diffusion coefficient characteristics, motion artifacts limit its assessment of tumor heterogeneity [7,10].
To improve the efficacy of diagnosis, many studies have relied on radiomics to predict LNM of non-small cell lung cancer [11–13]. Radiomics is a non-invasive technique that can be applied to traditional imaging modalities to extract and quantify radiomic features [14]. Radiomics has already been applied to the identification of malignancy [15] and histological subtypes [16], the prediction of gene expression [17], and the assessment of treatment response in lung cancer [18]. Radiomic features can be extracted from different regions of interest (ROIs) such as the intratumoral and/or peritumoral areas [19–22]. For example, Das SK et al. improved prediction performance in cT1N0M0 lung adenocarcinoma by combining features of the intratumoral region, the peritumoral region, and the lymph node [23].
With radiomic approaches becoming more common in medical research, it was hypothesized that radiomic features of the primary tumor would be instrumental in predicting the possibility of LNM in lung adenocarcinoma. Therefore, the purpose of this review was to provide a general overview of the methodological quality and to evaluate the diagnostic performance of radiomics for the prediction of LNM in lung adenocarcinoma.

Methods
This systematic review and meta-analysis was reported in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Diagnostic Test Accuracy (PRISMA-DTA) guidelines (Additional file 1: Table S1) and was registered in the PROSPERO database for systematic reviews (CRD42022375712) [24].

Database search strategy
A comprehensive search of PubMed, Embase, the Web of Science Core Collection, and the Cochrane Library was conducted up to November 16, 2022. Search terms such as "lung adenocarcinoma", "machine learning", "radiomics", and "lymph node metastasis" were included. The detailed search strategy is described in Table S2 (Additional file 1). No language or publication date restrictions were placed on the initial database search.

Study selection
Studies were selected if they met all inclusion criteria: (1) patients with lung adenocarcinoma confirmed by pathology; (2) articles based on CT/MRI/PET-CT radiomics to evaluate the likelihood of preoperative LNM; (3) the ROI for segmentation contained the primary tumor; (4) articles were published in English. Studies were excluded if they met any of the following exclusion criteria: (1) case studies, editorials, letters, review articles, and conference abstracts; (2) studies not in the field of interest.

Data extraction
Two investigators independently extracted the following information from each selected study: (1) study details: first author, publication year, country of origin, study design; (2) patient details: the source of data acquisition, criteria for lymph node staging, diameter and density of primary tumor, diagnostic method of LNM, number of patients and negative/positive LNM in the training/internal validation/external validation cohort, clinical stage; (3) imaging details: imaging modality; (4) radiomic details: segmentation method and software, ROI, radiomic feature extraction software and method, number of radiomic features extracted, type of radiomic features extracted, type of models constructed, the best performance model, number of radiomic/non-radiomic features included in the best performance model; (5) diagnostic performance: sensitivity, specificity, and area under the curve (AUC)/concordance index (C-index) of the prediction models.
If a study included more than one predictive model, the radiomics models with the highest AUC/C-index in the training cohort and in the validation cohort, respectively, were included in the quantitative evaluation [25,26]. If a study included both an internal validation cohort and an external validation cohort, we included data from both cohorts.

Risk of bias assessment
The Radiomic Quality Score (RQS) [27] was used to evaluate the procedural validity of each study (Additional file 1: Table S3). The RQS provides rigorous evaluation criteria and reporting guidelines for radiomic studies [27]. It comprises sixteen items, each assigned a corresponding score, and the total score ranges from -8 to 36 [27]. The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) [28] was used to determine the risk of bias and the applicability of each included study (Additional file 1: Table S4). The QUADAS-2 tool is divided into two broad categories: the risk of bias and the applicability concerns [28]. The former covers patient selection, index test, reference standard, and flow and timing [28]. The latter covers patient selection, index test, and reference standard [28]. Based on answers of "yes", "no", or "unclear" for each item, the level was rated as "low", "high", or "unclear" [28]. Two authors independently used the RQS and QUADAS-2 to evaluate the quality of the literature. Discrepancies were discussed and re-evaluated to reach a consensus.
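The relation between a raw RQS total and the percentage RQS reported in this review can be sketched as follows. This is a minimal illustration, assuming the percentage is the total divided by the maximum of 36 and that negative totals are reported as 0%; the function name `rqs_percent` is hypothetical and is not part of any RQS software.

```python
RQS_MAX = 36  # maximum attainable total score stated in the text


def rqs_percent(total_score: int) -> float:
    """Convert a raw RQS total (-8 to 36) to a percentage of the maximum.

    Assumption: totals below zero are floored at 0%.
    """
    if not -8 <= total_score <= RQS_MAX:
        raise ValueError("RQS total must lie between -8 and 36")
    return round(max(total_score, 0) / RQS_MAX * 100, 1)


# Totals reported in this review: lowest 4, median 14, highest 16
percentages = [rqs_percent(score) for score in (4, 14, 16)]
```

Under these assumptions the three totals map to 11.1%, 38.9%, and 44.4%, which matches the percentages reported for the included studies.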

Statistical analysis
First, we extracted the sample size, sensitivity, and specificity of the best radiomics models in the training and validation cohorts from the studies. The numbers of true positives, false positives, false negatives, and true negatives were then calculated using Review Manager 5.4.
Quantitative evaluation was performed using the midas command in Stata 17.0 software. Pooled sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and AUC were calculated, and a summary receiver operating characteristic (SROC) curve was created. Heterogeneity was assessed from forest plots using Cochran's Q test (two-sided p < 0.05 was considered statistically significant) and the I² statistic (I² values of 25%, 50%, and 75% represent low, moderate, and high heterogeneity, respectively) [29]. The Spearman rank correlation coefficient was calculated to determine whether heterogeneity was caused by a threshold effect. The sources of heterogeneity were further analyzed by subgroup and univariate meta-regression analyses.
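The back-calculation of the 2×2 table from reported sensitivity and specificity can be sketched as below. This is an illustrative reconstruction only, not the Review Manager procedure itself: it assumes the numbers of LNM-positive and LNM-negative patients in a cohort are known, and the function name `two_by_two` and the example counts are hypothetical.

```python
def two_by_two(n_positive: int, n_negative: int,
               sensitivity: float, specificity: float):
    """Reconstruct (TP, FP, FN, TN) from cohort sizes and reported accuracy.

    Counts are rounded to the nearest integer, so the reconstructed table
    may differ slightly from the one underlying the published values.
    """
    tp = round(sensitivity * n_positive)   # LNM-positive, predicted positive
    fn = n_positive - tp                   # LNM-positive, predicted negative
    tn = round(specificity * n_negative)   # LNM-negative, predicted negative
    fp = n_negative - tn                   # LNM-negative, predicted positive
    return tp, fp, fn, tn


# Hypothetical cohort: 100 LNM-positive and 200 LNM-negative patients
tp, fp, fn, tn = two_by_two(100, 200, sensitivity=0.84, specificity=0.88)
```

Because published sensitivity and specificity are themselves rounded, the reconstructed counts are approximations; they always sum to the cohort size by construction.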
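For readers less familiar with these summary measures, the standard definitions of PLR, NLR, DOR, and I² can be sketched as follows. This is a minimal illustration of the textbook formulas for a single sensitivity/specificity pair and a single Q statistic; the pooled values in this review come from the midas model fit, not from these functions, and the names `likelihood_ratios` and `i_squared` are hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (PLR, NLR, DOR) for one sensitivity/specificity pair."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        # Values of exactly 0 or 1 make a ratio undefined
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    plr = sensitivity / (1 - specificity)   # positive likelihood ratio
    nlr = (1 - sensitivity) / specificity   # negative likelihood ratio
    dor = plr / nlr                         # diagnostic odds ratio
    return plr, nlr, dor


def i_squared(q: float, degrees_of_freedom: int) -> float:
    """I² (%) from Cochran's Q and its degrees of freedom, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q) * 100


# Illustrative inputs only, not data from the included studies
plr, nlr, dor = likelihood_ratios(0.80, 0.90)
heterogeneity = i_squared(q=20.0, degrees_of_freedom=9)
```

With ten studies there are nine degrees of freedom, so a Q of 20 would correspond to an I² of about 55%, which falls between the moderate and high thresholds quoted above.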

Quality assessment
The overall RQS and percent RQS for each study are presented in Table 3 and Fig. 2, along with the scores for the individual components. The median total RQS was 14 (range 4–16), corresponding to 38.9% (range 11.1%–44.4%). Most studies (8/17, 47.1%) had RQS scores between 30% and 40% (Fig. 2a). No study scored on the four items "Cost-effectiveness analysis", "Prospective study", "Biological correlates", and "Imaging at multiple time points" (Fig. 2b). The distribution of the QUADAS-2 scores for each included study is shown in Table S6 (Additional file 1) and Fig. 3. The risk of bias in patient selection was low in 13 studies and unclear in 4 studies. The risk of bias for the index test was low in 10 studies and unclear in 7 studies. The risk of bias for the reference standard was low in all 17 studies. The risk of bias for flow and timing was low in 14 studies, unclear in 2 studies, and high in 1 study. Most studies were assessed as having a low risk of bias and minimal concerns regarding applicability.

Discussion
This study revealed that radiomic features extracted from the primary tumor have the potential to predict preoperative LNM in lung adenocarcinoma. The QUADAS-2 and RQS tools were applied to assess the risk of bias and the quality of the radiomic methods. Meta-analysis was used to quantitatively evaluate the diagnostic performance of the best radiomics models. The radiomics models achieved satisfactory diagnostic performance in both the training and validation cohorts. However, the low methodological quality identified in the systematic review and the high heterogeneity of the quantitative meta-analysis suggest that radiomics models still need to be improved to better assist clinical practice.
The clinical diagnosis of positive LNM is usually based on imaging findings (e.g., short axis diameter of lymph nodes > 10 mm on CT, maximum standardized uptake value ≥ 2.5 on PET/CT). However, the subjectivity of manual identification and the limits of the naked eye are highly likely to introduce errors, such as missed occult LNM [8,9,47,48]. Radiomics can directly extract features from the ROIs of macroscopic images (such as the primary tumor or the peritumoral area) for quantitative analysis in a high-throughput manner [49]. In this review, radiomics studies based on the primary tumor were included. Based on the characteristics of the primary tumor, the severity of tumor hypoxia and the angiogenic effects of the primary lesion can be identified to evaluate tumor heterogeneity [50]. Cancerous cells within the primary tumor can proliferate by generating new lymphatic vessels in a variety of ways [51], or they can metastasize to the mediastinum through abundant subpleural drainage [37,52].
The RQS was able to assess the quality of the radiomic methods; however, the best score achieved in the included studies was 16 (44.4%) [23,40,41,43]. The reason for this result was that the 17 studies scored low on individual RQS items, which indicates a lack of standardized workflow for radiomics research (Table 3). In terms of imaging, all studies documented good image protocol quality and multiple segmentations. However, few studies explored the differences between various scanners or provided open data sources, which leads to low reproducibility of radiomics research. The choice of ROI segmentation method also had a certain effect. The accuracy of manual segmentation is high, but it is limited by time consumption and inter-reader variation. In one study, radiomic features were not included in the best prediction model, likely because only three independent features were selected for analysis due to the small sample size [36]. Skewness was incorporated as a radiomics feature in the best prediction models of 5 studies [30,34,35,38,43], and one study found that the skewness of lymph node positive lesions was significantly lower than that of negative lesions [30]. Meanwhile, the biological validation of models can facilitate the clinical translation of radiomics. Although two studies combined genes or proteins [44,45], neither of them was statistically significant. Finally, multi-center validation is key to reducing overfitting and optimizing the model. Therefore, future radiomics studies should follow standardized workflows, such as obtaining large and high-quality multi-center datasets, ensuring consistent image acquisition parameters, developing accurate and reproducible segmentation methods, and correlating with genomics or proteomics.
Fig. 6 Univariable meta-regression analysis plot to investigate sources of heterogeneity. (Small Sample Size: sample sizes ≤ 300; Diameter: primary tumor diameter ≤ 30 mm)
According to the QUADAS-2 results, most studies had a low risk of bias and good applicability, which may be due to the inclusion of appropriate patient groups and the selection of gold standards for reference. However, some studies were unclear about the selection of participants and about whether interpretation was blinded with respect to the gold standard. Thus, future studies need to state clearly the exclusion criteria and procedures for patient selection, as well as whether there is an appropriate time interval between the reference standard and the imaging examination.
The high heterogeneity of radiomics models in the quantitative evaluation cannot be ignored, although they showed good diagnostic performance. We identified primary tumor diameter (≤ 30 mm or not) as a possible source of heterogeneity in sensitivity. Tumor diameter was also identified as an important predictor among non-radiomic features in this review (Additional file 1: Table S5) [34,35,37,40,43]. Similarly, patients with a relatively large primary tumor diameter tend to have a relatively high probability of LNM and poor prognosis [46]. Meanwhile, in terms of specificity, imaging modality, sample size, and radiomics software were possible sources of heterogeneity. This review mainly included CT-based radiomics models, and their diagnostic performance relative to other imaging modalities (PET or PET/CT) remains to be studied. One of the included studies compared the performance of radiomic prediction models derived from different imaging modalities (CT, PET, or PET/CT) and showed that PET/CT yielded better results than the others [41]. A larger sample size allows for a more comprehensive assessment of a radiomics study, and public databases could expand the sample size [53].
Different radiomics feature extraction software was used in the included studies, which contributed to the heterogeneity in specificity. One study showed that discrepancies were present among seven different radiomics feature extraction software packages [54]. Therefore, to address the differences caused by image acquisition, further studies need to perform image normalization (such as resampling) or follow a standardized protocol for image acquisition and reconstruction [55], which will greatly help the stability of radiomics feature extraction. In addition, the algorithms and code of radiomics feature software should conform to the Image Biomarker Standardisation Initiative to improve reproducibility and should be verified in multiple cohorts [54].
This systematic review also has some limitations. First, almost all the included studies were from China. Therefore, some geographic bias may be present due to the greater prevalence of adenocarcinoma in Asian populations. Second, all studies were retrospective, and only three studies used multicenter data. This may lead to selection bias. Third, studies on MRI were not included in this review due to a lack of matching studies. Fourth, the low RQS and high QUADAS-2 results may have affected the literature quality assessment. Finally, only 10 of the included articles were used for meta-analysis, and they showed high heterogeneity. Although we found possible sources of heterogeneity, more studies are needed to explore them further.

Conclusions
In conclusion, this review found that radiomic features based on the primary tumor have the potential to predict preoperative LNM of lung adenocarcinoma. However, future research needs a standardized radiomics workflow, including multi-center and prospective studies, to promote the applicability of radiomics.

Fig. 1 Flowchart of the study screening and selection process

a 15 mm around the tumor
b 5 mm around the tumor
c Adjacent pleural regions of interest delineation was defined as two lines tangent to the edges of the tumor, intersecting the visceral pleura at 90°
d There are inconsistencies in the data in the original literature
e The best performance model of all the prediction models constructed in the study

Fig. 2 Qualitative quality assessment evaluated through the Radiomics Quality Score (RQS) tool. a Proportion of studies with different RQS percentage score. b Percentage of the 16 components of the included studies with different scores in the RQS

Table 2 Radiomics workflow for the included studies

AVG average CT values of whole tumor, CT computed tomography, FDR false discovery rate, GLCM gray level co-occurrence matrix, GLDM gray level dependence matrix, GLDZM gray level distance zone matrix, GLRLM gray level run length matrix, GLSZM gray level size zone matrix, GPTV gross and peritumoral volume, GTV gross tumor volume, ICC intra-class correlation coefficients, LASSO least absolute shrinkage and selection operator, mRMR minimum-redundancy-maximum-relevance, NA not available, NGLDM neighborhood grey level dependence matrix, NGTDM neighbouring gray tone difference matrix, PCA principal component analysis, PET/CT fluorine-18 fluorodeoxyglucose positron emission tomography/computed tomography, PTV peritumoral volume

Table 3 Radiomic quality scores for all included studies